Statistics Cheat Sheet 


Population 
The entire group one desires information about 


Sample 


A subset of the population taken because the entire population is usually too large to analyze 
Its characteristics are taken to be representative of the population 


Mean 
Also called the arithmetic mean or average 
The sum of all the values in the sample/population divided by the number of values
$\mu$ is the mean of the population; $\bar{x}$ is the mean of the sample


Median 
The value separating the higher half of a sample/population from the lower half 


Found by arranging all the values from lowest to highest and taking the middle one (or the mean of the middle two if there 
are an even number of values) 


Variance 
Measures dispersion around the mean 
Determined by averaging the squared differences of all the values from the mean 
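Variance of a population is $\sigma^2$. Can be calculated by subtracting the square of the mean from the
average of the squared scores:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{n} = \frac{\sum x^2}{n} - \mu^2$$

Variance of a sample is $s^2$; note the $n - 1$. Can be calculated by:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$

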
Standard Deviation
Square root of the variance 
Also measures dispersion around the mean but in the same units as the values (instead of square units with variance) 
$\sigma$ is the standard deviation of the population and $s$ is the standard deviation of the sample
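
The definitions above can be checked on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the sample values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)                 # sum of values / number of values
median = statistics.median(values)             # even count, so mean of the middle two
pop_variance = statistics.pvariance(values)    # population variance, divides by n
sample_variance = statistics.variance(values)  # sample variance, divides by n - 1
pop_sd = math.sqrt(pop_variance)               # same units as the values
sample_sd = math.sqrt(sample_variance)

# Shortcut for the population variance:
# average of the squared scores minus the square of the mean
shortcut = sum(v * v for v in values) / len(values) - mean ** 2

print(mean, median, pop_variance, sample_variance, pop_sd, sample_sd, shortcut)
```

The sample variance is larger than the population variance here because it divides by n - 1 rather than n.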


Standard Error 


An estimate of the standard deviation of the sampling distribution—the set of all samples of size n that can be taken from a 
population 
Reflects the extent to which a statistic changes from sample to sample 


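For a mean:

$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$

For the difference between two means, assuming equal variances:

$$SE_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$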
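
As a rough illustration, the standard error of a mean can be computed directly from a sample. The sketch below uses hypothetical data. The second function uses the standard pooled-variance formula for the difference between two means, which is an assumption about what this sheet intends for the equal-variance case; the function names are illustrative.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def pooled_se_of_difference(s1, n1, s2, n2):
    """Standard error of (mean1 - mean2), assuming equal population variances.

    Uses the standard pooled-variance formula (an assumption, see lead-in).
    """
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

# Hypothetical sample
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error_of_mean(sample)

# Hypothetical summary statistics for two groups
se_diff = pooled_se_of_difference(s1=2, n1=8, s2=2, n2=8)

print(se, se_diff)
```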


T-test 


One-Sample 


Tests whether the mean of a normally distributed population is different from a specified value 


Null Hypothesis ($H_0$): states that the population mean is equal to some value ($\mu_0$)
Alternative Hypothesis ($H_a$): states that the mean does not equal/is greater than/is less than $\mu_0$
t-statistic: standardizes the difference between $\bar{x}$ and $\mu_0$

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

Read the table of t-distribution critical values for the p-value (the probability of obtaining a sample mean at least
this extreme by chance, given that $\mu_0$ is the population mean) using the calculated t-statistic and degrees of freedom.
$H_a$: $\mu > \mu_0$ → the t-statistic is likely positive; read table as given
$H_a$: $\mu < \mu_0$ → the t-statistic is likely negative; the t-distribution is symmetrical so read the probability as if the
t-statistic were positive
Note: if the t-statistic is of the 'wrong' sign, the p-value is 1 minus the p given in the chart
$H_a$: $\mu \neq \mu_0$ → read the p-value as if the t-statistic were positive and double it (to consider both less than and greater
than)
If the p-value is less than the predetermined value for significance (called $\alpha$, usually 0.05), reject the null
hypothesis and accept the alternative hypothesis.


Degrees of freedom (df) = $n - 1$


Example: 

You are experiencing hair loss and skin discoloration and think it might be because of selenium toxicity. You decide to 
measure the selenium levels in your tap water once a day for one week. Your results are given below. The EPA 
maximum contaminant level for safe drinking water is 0.05 mg/L. Does the selenium level in your tap water exceed 
the legal limit (assume $\alpha = 0.05$)?


$H_0$: $\mu = 0.05$; $H_a$: $\mu > 0.05$

Day   Selenium (mg/L)
1     0.051
2     0.0505
3     0.049
4     0.0516
5     0.052
6     0.0508
7     0.0506

Calculate the mean and standard deviation of your sample:

$$\bar{x} = 0.0508$$

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{(0.051 - 0.0508)^2 + (0.0505 - 0.0508)^2 + \text{etc...}}{6} = 9.15 \times 10^{-7}$$

$$s = \sqrt{s^2} = 9.56 \times 10^{-4}$$

The t-statistic is:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{0.0508 - 0.05}{9.56 \times 10^{-4} / \sqrt{7}} = 2.17$$

and the degrees of freedom are $n - 1 = 7 - 1 = 6$.

Looking at the t-distribution of critical values table, 2.17 with 6 degrees of freedom is between p=0.05 and p=0.025.
This means that the p-value is less than 0.05, so you can reject $H_0$ and conclude that the selenium level in your tap
water exceeds the legal limit.
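
The selenium example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library. Two caveats: the day 2 reading is taken from the worked variance formula (0.0505), and the day 3 reading of 0.049 is an assumption inferred from the stated mean and standard deviation, since that row is not legible in this copy. The two critical values are the df = 6 entries of the t-table for tail probabilities .05 and .025.

```python
import math
import statistics

# Daily selenium readings in mg/L.
# Day 3 (0.049) is an assumption inferred from the stated mean and s.
selenium = [0.051, 0.0505, 0.049, 0.0516, 0.052, 0.0508, 0.0506]
mu_0 = 0.05  # EPA maximum contaminant level, the value under H0

n = len(selenium)
x_bar = statistics.mean(selenium)
s = statistics.stdev(selenium)           # sample standard deviation (n - 1)
t = (x_bar - mu_0) / (s / math.sqrt(n))  # one-sample t-statistic
df = n - 1

# One-tailed critical values for df = 6 from the t-table (p = .05 and p = .025)
crit_05, crit_025 = 1.943, 2.447
reject_h0 = t > crit_05  # Ha is mu > mu_0, so only the upper tail matters

print(f"x_bar={x_bar:.4f}  s={s:.2e}  t={t:.2f}  df={df}  reject H0: {reject_h0}")
```

Because t falls between the two critical values, the p-value lies between 0.025 and 0.05, which agrees with the table lookup in the example.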


Two-Sample 


Tests whether the means of two populations are significantly different from one another 
Paired 
Each value of one group corresponds directly to a value in the other group; e.g., before and after values of a drug
treatment for each individual patient
Subtract the two values for each individual to get one set of values (the differences) and use
$\mu_0 = 0$ to perform a one-sample t-test
Unpaired 
The two populations are independent 
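$H_0$: states that the means of the two populations are equal ($\mu_1 = \mu_2$)
$H_a$: states that the means of the two populations are unequal or one is greater than the other ($\mu_1 \neq \mu_2$, $\mu_1 > \mu_2$, $\mu_1 < \mu_2$)


t-statistic:

assuming equal variances:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

assuming unequal variances:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

degrees of freedom = $(n_1 - 1) + (n_2 - 1)$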


Read the table of t-distribution critical values for the p-value using the calculated t-statistic and degrees of
freedom. Remember to keep the sign of the t-statistic clear (order of subtracting the sample means) and to
double the p-value for an $H_a$ of $\mu_1 \neq \mu_2$.


Example: 

Consider the lifespan of 18 rats. 12 were fed a restricted calorie diet and lived an average of 700 days (standard 
deviation=21 days). The other 6 had unrestricted access to food and lived an average of 668 days (standard 
deviation=30 days). Does a restricted calorie diet increase the lifespan of rats (assume $\alpha = 0.05$)?


$\bar{x}_1 = 700$, $s_1 = 21$, $n_1 = 12$; $\bar{x}_2 = 668$, $s_2 = 30$, $n_2 = 6$

$H_0$: $\mu_1 = \mu_2$

$H_a$: $\mu_1 > \mu_2$ (because we are only asking if a restricted calorie diet increases lifespan)

We cannot assume that the variances of the two populations are equal because the different diets could also affect
the variability in lifespan.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{700 - 668}{\sqrt{\frac{21^2}{12} + \frac{30^2}{6}}} = 2.34$$

Degrees of freedom = $(n_1 - 1) + (n_2 - 1) = (12 - 1) + (6 - 1) = 16$
From the t-distribution table, the p-value falls between 0.01 and 0.02, so we reject $H_0$. The restricted calorie
diet does increase the lifespan of rats.
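
The rat lifespan example works from summary statistics alone, so it is easy to verify. The sketch below assumes the usual unequal-variance form of the t-statistic (difference of sample means divided by the square root of the summed per-group variances over n); the function name is illustrative. The degrees of freedom follow this sheet's rule, (n1 - 1) + (n2 - 1).

```python
import math

def unequal_variance_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t-statistic without assuming equal variances."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

# Restricted-calorie group first, so a positive t supports Ha: mu1 > mu2
t = unequal_variance_t(mean1=700, s1=21, n1=12, mean2=668, s2=30, n2=6)
df = (12 - 1) + (6 - 1)  # degrees of freedom as defined in this sheet

print(f"t={t:.2f}  df={df}")
```

Swapping the order of the two groups would flip the sign of t, which is why the sheet stresses keeping the order of subtraction clear.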


Chi-Square Test 
For Goodness of Fit 
Checks whether or not an observed pattern of data fits some given distribution 
$H_0$: the observed pattern fits the given distribution
$H_a$: the observed pattern does not fit the given distribution

The chi-square statistic is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

($O$ is the observed value and $E$ is the expected value)

Degrees of freedom = number of categories in the distribution - 1

Get the p-value from the table of $\chi^2$ critical values using the calculated $\chi^2$ and df values. If the p-value is less than $\alpha$, the
observed data do not fit the expected distribution. If $p > \alpha$, the data likely fit the expected distribution.


Example 1: 

You breed puffskeins and would like to determine the pattern of inheritance for coat color and purring ability.
Puffskeins come in either pink or purple and can either purr or hiss. You breed a purebred, pink purring male with a
purebred, purple hissing female. All individuals of the $F_1$ generation are pink and purring. The $F_2$ offspring are shown
below. Do the alleles for coat color and purring ability assort independently (assume $\alpha = 0.05$)?

Pink and Purring   Pink and Hissing   Purple and Purring   Purple and Hissing
143                60                 55                   18

Independent assortment means a phenotypic ratio of 9:3:3:1, so:
$H_0$: the observed distribution of $F_2$ offspring fits a 9:3:3:1 distribution
$H_a$: the observed distribution of $F_2$ offspring does not fit a 9:3:3:1 distribution

The expected values are:

Pink and Purring   Pink and Hissing   Purple and Purring   Purple and Hissing
155.25             51.75              51.75                17.25

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(143 - 155.25)^2}{155.25} + \frac{(60 - 51.75)^2}{51.75} + \frac{(55 - 51.75)^2}{51.75} + \frac{(18 - 17.25)^2}{17.25} = 2.519$$

df = 4 - 1 = 3
From the table of $\chi^2$ critical values, the p-value is greater than 0.25, so we do not reject $H_0$. The data are consistent
with the alleles for coat color and purring ability assorting independently in puffskeins.
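
The puffskein calculation can be sketched in Python as follows. The helper name `chi_square_statistic` is illustrative, not part of any library; the expected counts are derived from the 9:3:3:1 ratio and the observed total.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# F2 counts: pink/purring, pink/hissing, purple/purring, purple/hissing
observed = [143, 60, 55, 18]
ratio = [9, 3, 3, 1]  # phenotypic ratio under independent assortment

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1

print(expected, round(chi2, 3), df)
```

The same helper applies to any goodness-of-fit problem once the expected counts are known, for example equal expected counts when testing for a uniform distribution.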


Example 2:
You are studying the pattern of dispersion of king penguins and the diagram on the right
represents an area you sampled. Each dot is a penguin. Do the penguins display a uniform
distribution (assume $\alpha = 0.05$)?

[Diagram: the sampled area divided into 9 squares; each dot is a penguin]

$H_0$: there is a uniform distribution of penguins
$H_a$: there is not a uniform distribution of penguins


There are a total of 25 penguins in 9 squares, so if there is a uniform distribution, there should be 25/9 = 2.778
penguins per square. The actual observed values are 2, 4, 4, 3, 3, 3, 2, 3, 1, so the $\chi^2$ statistic is:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(1 - 2.778)^2}{2.778} + 2\,\frac{(2 - 2.778)^2}{2.778} + 4\,\frac{(3 - 2.778)^2}{2.778} + 2\,\frac{(4 - 2.778)^2}{2.778} = 2.72$$

df = 9 - 1 = 8
From the table of $\chi^2$ critical values, the p-value is greater than 0.25, so we do not reject $H_0$. The penguins do display a
uniform distribution.


Chi-Square Test 
For Independence 

Checks whether two categorical variables are related or not (independence) 

$H_0$: the two variables are independent

$H_a$: the two variables are not independent

Does not make any assumptions about an expected distribution 

The observed values (#1, #2, #3, and #4) are usually presented as a table. Each row is a category of variable 2 and each
column is a category of variable 1.

                            Variable 1
                        Category x   Category y   Totals
Variable 2  Category a  #1           #2           #1+#2
            Category b  #3           #4           #3+#4
            Totals      #1+#3        #2+#4        #1+#2+#3+#4

The proportion of category x of variable 1 is the number of individuals in category x divided by the total number of
individuals:

$$\frac{\#1 + \#3}{\#1 + \#2 + \#3 + \#4}$$

Assuming independence, the expected number of individuals that fall within category a of variable 2 (and category x of
variable 1) is the proportion of category x multiplied by the number of individuals in category a:

$$\left(\frac{\#1 + \#3}{\#1 + \#2 + \#3 + \#4}\right)(\#1 + \#2)$$

Thus, the expected value is:

$$E = \frac{(\#1 + \#2)(\#1 + \#3)}{\#1 + \#2 + \#3 + \#4} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

Degrees of freedom = $(r - 1)(c - 1)$ where $r$ is the number of rows and $c$ is the number of columns
The chi-square statistic is still:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Read the p-values from the table of $\chi^2$ critical values.


Example: 
Given the data below, is there a relationship between fitness level and smoking habits (assume $\alpha = 0.05$)?

                                      Fitness Level
                          Low     Medium-Low   Medium-High   High    Total
Never smoked              113     113          110           159     495
Former smokers            119     135          172           190     616
1 to 9 cigarettes daily   77      91           86            65      319
≥ 10 cigarettes daily     181     152          124           73      530
Total                     490     491          492           487     1960


$H_0$: fitness level and smoking habits are independent

$H_a$: fitness level and smoking habits are not independent

First, we calculate the expected counts. For the first cell, the expected count is:

$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}} = \frac{(495)(490)}{1960} = 123.75$$

                                      Fitness Level
                          Low      Medium-Low   Medium-High   High
Never smoked              123.75   124          124.26        122.99
Former smokers            154      154.31       154.63        153.06
1 to 9 cigarettes daily   79.75    79.91        80.08         79.26
≥ 10 cigarettes daily     132.5    132.77       133.04        131.69

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(113 - 123.75)^2}{123.75} + \frac{(113 - 124)^2}{124} + \frac{(110 - 124.26)^2}{124.26} + \cdots \approx 87.3$$

df = $(r - 1)(c - 1) = (4 - 1)(4 - 1) = 9$


From the table of $\chi^2$ critical values, the p-value is less than 0.001, so we reject $H_0$ and conclude that there is a
relationship between fitness level and smoking habits.
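
The expected counts and the statistic for the fitness and smoking table can be computed from the observed counts alone. The sketch below is one plain-Python way to do it; variable names are illustrative. The critical value 27.88 is the df = 9 entry in the .001 column of the $\chi^2$ table.

```python
# Observed counts: rows are smoking habits, columns are fitness levels
# (Low, Medium-Low, Medium-High, High)
observed = [
    [113, 113, 110, 159],  # never smoked
    [119, 135, 172, 190],  # former smokers
    [77, 91, 86, 65],      # 1 to 9 cigarettes daily
    [181, 152, 124, 73],   # 10 or more cigarettes daily
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell: (row total)(column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

critical_001 = 27.88  # chi-square critical value for df = 9, p = .001
print(round(chi2, 2), df, chi2 > critical_001)
```

Since the statistic is far above the .001 critical value, the p-value is below 0.001, matching the conclusion of the example.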


Type I error


The probability of rejecting a true null hypothesis
Equals $\alpha$


Type II error
The probability of failing to reject a false null hypothesis


Probability 


Joint Probability 


The probability of events A and B occurring 
P(A and B) = P(A) x P(B) when events A and B are independent 


Union of Events 


The probability of either event A or event B occurring 
P(A or B) = P(A) + P(B) - P(A and B)


Conditional Probability 
The probability of event A occurring given that event B has occurred 



$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{P(B \mid A) \times P(A)}{P(B)}$$

That is, the chances of finding an A outcome among all the possible B outcomes.


Example 1: 
Assume that eye color is an autosomally inherited trait controlled by one gene with two alleles. Brown is dominant to
blue. A brown-eyed man with genotype Bb and a blue-eyed woman have three children. The first has blue eyes. What
is the probability that all three children have blue eyes?


Without taking the first child's eye color as given, the probability that the couple has three children with blue eyes is
0.5 × 0.5 × 0.5 = 0.125 = P(A and B) = P(2 children = bb and 1st child = bb)
Given these parents, the probability that the 1st child is bb is: P(B) = P(1st child = bb) = 0.5


Therefore:

$$P(\text{2 children} = bb \mid \text{1st child} = bb) = \frac{P(A \text{ and } B)}{P(B)} = \frac{0.125}{0.5} = 0.25$$


Example 2: 

Based on an analysis of her pedigree, it is determined that a woman has a 70% chance of being Zz and a 30% chance of
being ZZ for a sex-linked trait, where Z is dominant to z. If she now has a son with the Z phenotype, what is the
probability of her being Zz?


We're looking for: P(W=Zz|S=Z)
But it's hard to find P(W=Zz and S=Z) because the two events are not independent. Instead, let us use:

$$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}$$

P(S = Z | W = Zz) = 0.5 (50% chance of passing on the Z allele)
P(W = Zz) = 0.7 (given)
P(S = Z) = (0.7 × 0.5) + (0.3 × 1) = 0.65 (son can be Z from the woman being either Zz or ZZ)

$$P(W = Zz \mid S = Z) = \frac{0.5 \times 0.7}{0.65} = 0.538$$
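
The pedigree calculation is a direct application of the rule P(A|B) = P(B|A) × P(A) / P(B). The sketch below uses exact fractions to avoid rounding; the function and variable names are illustrative.

```python
from fractions import Fraction

def bayes(prior_a, prob_b_given_a, prob_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return prob_b_given_a * prior_a / prob_b

p_w_zz_het = Fraction(7, 10)        # P(W = Zz), given
p_w_zz_hom = Fraction(3, 10)        # P(W = ZZ), given
p_son_z_given_het = Fraction(1, 2)  # a Zz mother passes on Z half the time
p_son_z_given_hom = Fraction(1)     # a ZZ mother always passes on Z

# Total probability of a Z son, summed over both possible maternal genotypes
p_son_z = p_w_zz_het * p_son_z_given_het + p_w_zz_hom * p_son_z_given_hom

posterior = bayes(p_w_zz_het, p_son_z_given_het, p_son_z)

print(p_son_z, posterior, round(float(posterior), 3))
```

Observing a Z son lowers the probability that the woman is Zz from 0.7 to about 0.538, because a Z son is more likely if she is ZZ.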


Multiple Experiments 


Binomial distribution 
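For when you are not concerned about the order of the events, only that they occur

$$P(X = m) = \frac{n! \times p^m \times (1 - p)^{n - m}}{m! \times (n - m)!}$$

for $m$ outcomes of event X in $n$ total trials with $p$ = probability of X occurring once
Example:
What is the probability that a couple has one boy out of five children?

$$P(X = 1) = \frac{5! \times \left(\frac{1}{2}\right)^1 \times \left(\frac{1}{2}\right)^4}{1! \, (4)!} = \frac{5}{32} \approx 0.156$$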
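
The binomial formula can be evaluated with `math.comb`, which computes n! / (m!(n - m)!). This is a minimal sketch; the function name `binomial_pmf` is illustrative.

```python
import math

def binomial_pmf(m, n, p):
    """Probability of exactly m occurrences in n independent trials."""
    return math.comb(n, m) * p ** m * (1 - p) ** (n - m)

# One boy out of five children, with p = 0.5 for a boy at each birth
p_one_boy = binomial_pmf(m=1, n=5, p=0.5)

# Probabilities over all possible counts (0 to 5 boys) should sum to 1
total = sum(binomial_pmf(m, 5, 0.5) for m in range(6))

print(p_one_boy, total)
```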


Poisson distribution 


The binomial distribution works for a small number of trials but as $n$ gets too large, the factorials become
unwieldy.
The Poisson distribution is an approximation of the binomial distribution for large $n$ and small $p$.

$$P(X = m) = \frac{(np)^m \times e^{-np}}{m!}$$

Note: $np$ is also known as the number of expected outcomes for event X
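
To see how close the approximation is, the two formulas can be compared side by side. The values of n, p, and m below are hypothetical, picked so that n is large and p is small; the function names are illustrative.

```python
import math

def binomial_pmf(m, n, p):
    """Exact binomial probability of m occurrences in n trials."""
    return math.comb(n, m) * p ** m * (1 - p) ** (n - m)

def poisson_pmf(m, expected):
    """Poisson probability of m occurrences, where expected = n * p."""
    return expected ** m * math.exp(-expected) / math.factorial(m)

# Hypothetical scenario: 1000 trials, each with a 0.2% chance of event X
n, p, m = 1000, 0.002, 3

exact = binomial_pmf(m, n, p)
approx = poisson_pmf(m, n * p)

print(f"binomial={exact:.5f}  poisson={approx:.5f}  difference={abs(exact - approx):.5f}")
```

With these inputs the two probabilities agree to about three decimal places; the agreement generally worsens as p grows.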


TABLE B: t-DISTRIBUTION CRITICAL VALUES

        Tail probability p
df      .05     .025    .02
1       6.314   12.71   15.89
2       2.920   4.303   4.849
3       2.353   3.182   3.482
4       2.132   2.776   2.999
5       2.015   2.571   2.757
6       1.943   2.447   2.612
7       1.895   2.365   2.517
8       1.860   2.306   2.449
9       1.833   2.262   2.398
10      1.812   2.228   2.359
11      1.796   2.201   2.328
12      1.782   2.179
13      1.771   2.160
14      1.761   2.145
15      1.753   2.131
16      1.746   2.120
17      1.740   2.110
18      1.734   2.101
19      1.729   2.093
20      1.725
21      1.721
22      1.717
23      1.714
24      1.711
25      1.708
26      1.706
27      1.703
28      1.701
29      1.699
30      1.697
40      1.684
50      1.676
60      1.671
80      1.664
100     1.660
1000    1.646
∞       1.645
        90%     95%     96%
        Confidence level C


$\chi^2$ CRITICAL VALUES

        Tail probability p
df      .25     .20     .15     .10     .05     .025    .02     .01     .005    .0025   .001
1                       2.07    2.71                                    7.88            10.83
2                       3.79    4.61                                    10.60           13.82
3                       5.32    6.25                                    12.84           16.27
4                       6.74    7.78                                    14.86           18.47
5                       8.12    9.24                                    16.75           20.51
6                       9.45    10.64                                   18.55           22.46
7                       10.75   12.02                                   20.28           24.32
8                       12.03   13.36                                   21.95           26.12
9                       13.29   14.68                                   23.59           27.88
10                      14.53   15.99                                   25.19           29.59
11                      15.77   17.28                                   26.76           31.26
12                      16.99   18.55                                   28.30           32.91
13                      18.20   19.81                                   29.82           34.53
14                      19.41   21.06                                   31.32           36.12
15                      20.60   22.31                                   32.80           37.70
16                      21.79   23.54                                   34.27           39.25
17                      22.98   24.77                                   35.72           40.79
18                      24.16   25.99                                   37.16           42.31
19                      25.33   27.20                                   38.58           43.82
20                      26.50   28.41                                   40.00           45.31
21                      27.66   29.62                                   41.40           46.80
22                      28.82   30.81   33.92                           42.80           48.27
23                      29.98   32.01   35.17                           44.18           49.73
24                      31.13   33.20   36.42   39.36                   45.56           51.18
25                      32.28   34.38   37.65   40.65                   46.93           52.62
26                      33.43   35.56   38.89   41.92                   48.29           54.05
27                      34.57   36.74   40.11   43.19                   49.64           55.48
28                      35.71   37.92   41.34   44.46                   50.99           56.89
29                      36.85   39.09   42.56   45.72                   52.34           58.30
30                      37.99   40.26   43.77   46.98                   53.67           59.70
40                      49.24   51.81   55.76   59.34                   66.77           73.40
50                      60.35   63.17   67.50   71.42                   79.49           86.66
60                      71.34   74.40   79.08   83.30                   91.95   95.34   99.61
80              90.41   93.11   96.58   101.9   106.6   108.1   112.3   116.3   120.1   124.8
100     109.1   111.7   114.7   118.5   124.3   129.6   131.1   135.8   140.2   144.3   149.4